Data skeletons: simultaneous estimation of multiple quantiles for massive streaming datasets with applications to density estimation
Abstract
We consider the problem of density estimation when the data arrive as a continuous stream with no fixed length. In this setting, implementations of the usual methods of density estimation, such as kernel density estimation, are problematic. We propose a method of density estimation for massive datasets that is based upon taking the derivative of a smooth curve that has been fit through a set of quantile estimates. To achieve this, we propose a low-storage, single-pass, sequential method for simultaneous estimation of multiple quantiles for massive datasets, which forms the basis of this method of density estimation. For comparison, we also consider a sequential kernel density estimator. The proposed methods are shown through a simulation study to perform well and to have several distinct advantages over...
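As a rough illustration of the idea described in the abstract, the Python sketch below tracks several quantiles in a single pass with bounded storage and then reads a density estimate off the slope of the estimated distribution function. It is not the authors' algorithm. It assumes a simple stand-in scheme (a running average of per-buffer sample quantiles) and a finite-difference slope in place of a fitted smooth curve. The names `StreamingQuantiles` and `density_from_quantiles`, the buffer size, and the probability grid are all hypothetical choices made for this example.

```python
import random


class StreamingQuantiles:
    """Running average of per-buffer sample quantiles (illustrative stand-in)."""

    def __init__(self, probs, buffer_size=500):
        self.probs = list(probs)
        self.buffer_size = buffer_size      # assumed value, not from the paper
        self.buffer = []
        self.estimates = [0.0] * len(self.probs)
        self.n_buffers = 0

    def update(self, x):
        """Consume one observation; storage never exceeds one buffer."""
        self.buffer.append(x)
        if len(self.buffer) == self.buffer_size:
            self._flush()

    def _flush(self):
        data = sorted(self.buffer)
        self.buffer = []
        self.n_buffers += 1
        for k, p in enumerate(self.probs):
            # Sample quantile of this buffer by linear interpolation.
            pos = p * (len(data) - 1)
            lo = int(pos)
            hi = min(lo + 1, len(data) - 1)
            q = data[lo] + (pos - lo) * (data[hi] - data[lo])
            # Incremental mean over all buffers seen so far.
            self.estimates[k] += (q - self.estimates[k]) / self.n_buffers


def density_from_quantiles(probs, quantiles):
    """Slope of the CDF between adjacent quantile estimates.

    Returns (midpoint, density) pairs; a finite difference stands in
    for the derivative of a fitted smooth curve.
    """
    points = []
    for i in range(len(probs) - 1):
        width = quantiles[i + 1] - quantiles[i]
        if width > 0:
            mid = 0.5 * (quantiles[i] + quantiles[i + 1])
            points.append((mid, (probs[i + 1] - probs[i]) / width))
    return points


# Simulated stream: 50,000 standard normal draws, seen one at a time.
rng = random.Random(0)
probs = [i / 20 for i in range(1, 20)]      # 0.05, 0.10, ..., 0.95
sq = StreamingQuantiles(probs)
for _ in range(50_000):
    sq.update(rng.gauss(0.0, 1.0))
density = density_from_quantiles(probs, sq.estimates)
```

In this sketch a trailing partial buffer is simply ignored, and the stored state is one buffer plus one running estimate per probability level. Replacing the finite difference with the derivative of a monotone smooth interpolant through the `(quantile, probability)` pairs would be closer in spirit to the smooth-curve approach the abstract describes.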
Similar resources
Exponentially Weighted Simultaneous Estimation of Several Quantiles
In this paper we propose a new method for simultaneously generating multiple quantiles corresponding to given probability levels from data streams and massive data sets. This method provides a basis for the development of single-pass, low-storage quantile estimation algorithms, which differ in complexity, storage requirements, and accuracy. We demonstrate that such algorithms may perform well even for hea...
Statistical methodology for massive datasets and model selection
Astronomy is facing a revolution in data collection, storage, analysis, and interpretation of large datasets. The data volumes here are several orders of magnitude larger than what astronomers and statisticians are used to dealing with, and the old methods simply do not work. The National Virtual Observatory (NVO) initiative has recently emerged in recognition of this need and to federate numer...
Estimation of E(Y) from a Population with Known Quantiles
In this paper, we consider the problem of estimating E(Y) based on a simple random sample when at least one of the population quantiles is known. We propose a stratified estimator of E(Y), and show that it is strongly consistent. We then establish the asymptotic normality of the suggested estimator, and prove that it ...
The Beta-Weibull Logaritmic Distribution: Some Properties and Applications
In this paper, we introduce a new five-parameter distribution with increasing, decreasing, and bathtub-shaped failure rates, called the Beta-Weibull-Logarithmic (BWL) distribution. Using the Stirling polynomials, various properties of the new distribution such as its probability density function, its reliability and failure rate functions, quantiles and moments, Rényi and Shannon entropie...
Simultaneous robust estimation of multi-response surfaces in the presence of outliers
A robust approach should be considered when estimating regression coefficients in multi-response problems. Many models are derived from the least squares method. Because the presence of outlier data is unavoidable in most real cases and because the least squares method is sensitive to these types of points, robust regression approaches appear to be a more reliable and suitable method for addres...
Journal title:
- Statistics and Computing
Volume 17
Publication date 2007